Mirror upstream elastic/elasticsearch#133903 for AI review (snapshot of HEAD tree) #84

phananh1010 · 2025-09-30T20:59:03Z

Single commit with tree=051723c20ef007ff63b8b530a5b3805947404935^{tree}, parent=4c4e49b2e9cbae0572760a0fc38dbbd58f7fa2be. Exact snapshot of upstream PR head. No conflict resolution attempted.

Summary by CodeRabbit

New Features
- Dynamic target index selection for OTEL metrics using attributes and receiver name, with support for explicit index overrides.
- Metric documents now include data_stream metadata (type, dataset, namespace) when applicable.
Refactor
- Centralized sanitization of data stream fields for consistent normalization.
- Improved grouping and routing via a typed target index.
Tests
- Added extensive tests for index routing, receiver-based routing, defaulting, sanitization, and ignoring routing-related attributes.

BASE=4c4e49b2e9cbae0572760a0fc38dbbd58f7fa2be HEAD=051723c20ef007ff63b8b530a5b3805947404935 Branch=main

coderabbitai · 2025-09-30T20:59:10Z

Walkthrough

Centralizes data stream field sanitization in DataStream and updates RerouteProcessor to use it. Introduces TargetIndex for deriving target indexes and integrates it into OTEL data point grouping and metric document building. Adjusts visibility in MappingHints. Adds comprehensive tests for TargetIndex, grouping behavior, document data_stream output, and ignored attributes.

Changes

Cohort / File(s)	Summary
Centralized sanitization in DataStream `server/src/main/java/.../DataStream.java`	Adds public static sanitizers: sanitizeType, sanitizeDataset, sanitizeNamespace. Implements shared normalization with patterns, length cap, and replacement.
RerouteProcessor uses DataStream sanitizers `modules/ingest-common/src/main/java/.../RerouteProcessor.java`	Removes local sanitization logic and constants; delegates to DataStream::sanitizeType/dataset/namespace in DataStreamValueSource factories.
Target index derivation (new feature) `x-pack/plugin/otel-data/src/main/java/.../datapoint/TargetIndex.java`	New class encapsulating target index resolution from attributes/scope/resources and optional receiver; supports data stream defaults and explicit index. Provides helpers and accessors.
Grouping uses TargetIndex `x-pack/plugin/otel-data/src/main/java/.../datapoint/DataPointGroupingContext.java`	Changes map keys and DataPointGroup field/constructor from String to TargetIndex; extracts receiverName from scope; evaluates TargetIndex per group.
Metric document data_stream output `x-pack/plugin/otel-data/src/main/java/.../docbuilder/MetricDocumentBuilder.java`	Emits data_stream {type,dataset,namespace} when TargetIndex indicates a data stream; ignores TargetIndex-related attributes during attribute rendering.
Mapping hints key visibility `x-pack/plugin/otel-data/src/main/java/.../docbuilder/MappingHints.java`	Makes MAPPING_HINTS private; adds public isMappingHintsAttribute(String).
Tests: TargetIndex and routing `x-pack/plugin/otel-data/src/test/java/.../datapoint/TargetIndexTests.java` `.../datapoint/DataPointGroupingContextTests.java`	Adds tests for TargetIndex evaluation, receiver-based routing, grouping across indices. Validates sanitization and precedence rules.
Tests: Metric document data_stream `x-pack/plugin/otel-data/src/test/java/.../docbuilder/MetricDocumentBuilderTests.java`	Uses TargetIndex.defaultMetrics(); asserts data_stream fields in output.
Tests: Ignored attributes in TSID funnel `x-pack/plugin/otel-data/src/test/java/.../tsid/AttributeListTsidFunnelTests.java`	Adds test ensuring target-index-related keys are ignored by the funnel.

Sequence Diagram(s)

```mermaid
sequenceDiagram
  autonumber
  actor OTELExporter as OTEL Exporter
  participant Ingest as Ingest Pipeline
  participant Group as DataPointGroupingContext
  participant Target as TargetIndex
  participant Builder as MetricDocumentBuilder
  participant ES as Elasticsearch

  OTELExporter->>Ingest: Send datapoints
  Ingest->>Group: Group datapoints by scope/resource
  Group->>Group: extract receiverName from scope name
  Group->>Target: evaluate(type, attributes, receiverName, scopeAttrs, resourceAttrs)
  Target-->>Group: TargetIndex (index or data_stream fields)
  Group->>Builder: DataPointGroup with TargetIndex
  Builder->>Builder: buildDataStream if isDataStream()
  Builder->>ES: Index document to TargetIndex.index()
  ES-->>OTELExporter: Ack
```

```mermaid
sequenceDiagram
  autonumber
  participant RP as RerouteProcessor
  participant DS as DataStream

  RP->>DS: sanitizeType(type)
  RP->>DS: sanitizeDataset(dataset)
  RP->>DS: sanitizeNamespace(namespace)
  DS-->>RP: sanitized values
  RP->>RP: proceed with routing using sanitized values
```

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

I thump my paw—new streams aligned,
Indices hop where rules are defined.
A nibble of names, sanitized neat,
TargetIndex guides my fleet.
From scope to doc, we bound in time—
Carrots cached, mappings sublime. 🥕✨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 3.51% which is insufficient. The required threshold is 80.00%.	You can run `@coderabbitai generate docstrings` to improve docstring coverage.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check	✅ Passed	The title accurately describes that this pull request is mirroring the upstream elastic#133903 for AI review and indicates it is a snapshot of the HEAD tree, making it clear to reviewers what the PR’s purpose is.


coderabbitai

Actionable comments posted: 10

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

x-pack/plugin/otel-data/src/test/java/org/elasticsearch/xpack/oteldata/otlp/docbuilder/MetricDocumentBuilderTests.java (1)
167-179: Optional: also assert data_stream presence in empty-fields test.

Since you pass TargetIndex.defaultMetrics(), consider asserting data_stream.* exists even when resource/scope are empty, to prevent regressions.
```diff
@@
         documentBuilder.buildMetricDocument(builder, dataPointGroup);
@@
         assertThat(doc.evaluate("unit"), is(nullValue()));
+        assertThat(doc.evaluate("data_stream.type"), equalTo("metrics"));
+        assertThat(doc.evaluate("data_stream.dataset"), equalTo("generic.otel"));
+        assertThat(doc.evaluate("data_stream.namespace"), equalTo("default"));
```
x-pack/plugin/otel-data/src/main/java/org/elasticsearch/xpack/oteldata/otlp/docbuilder/MetricDocumentBuilder.java (1)
63-66: Consider honoring mapping hints instead of always passing empty.

If users provide elasticsearch.mapping.hints, propagate them to dynamic template selection.
```diff
-            String dynamicTemplate = dataPoint.getDynamicTemplate(MappingHints.empty());
+            // Parse once per group if needed; here kept local for clarity
+            String dynamicTemplate = dataPoint.getDynamicTemplate(
+                MappingHints.fromAttributes(dataPointGroup.dataPointAttributes())
+            );
```

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4c4e49b and c345e98.

📒 Files selected for processing (10)

modules/ingest-common/src/main/java/org/elasticsearch/ingest/common/RerouteProcessor.java (2 hunks)
server/src/main/java/org/elasticsearch/cluster/metadata/DataStream.java (2 hunks)
x-pack/plugin/otel-data/src/main/java/org/elasticsearch/xpack/oteldata/otlp/datapoint/DataPointGroupingContext.java (6 hunks)
x-pack/plugin/otel-data/src/main/java/org/elasticsearch/xpack/oteldata/otlp/datapoint/TargetIndex.java (1 hunks)
x-pack/plugin/otel-data/src/main/java/org/elasticsearch/xpack/oteldata/otlp/docbuilder/MappingHints.java (2 hunks)
x-pack/plugin/otel-data/src/main/java/org/elasticsearch/xpack/oteldata/otlp/docbuilder/MetricDocumentBuilder.java (4 hunks)
x-pack/plugin/otel-data/src/test/java/org/elasticsearch/xpack/oteldata/otlp/datapoint/DataPointGroupingContextTests.java (4 hunks)
x-pack/plugin/otel-data/src/test/java/org/elasticsearch/xpack/oteldata/otlp/datapoint/TargetIndexTests.java (1 hunks)
x-pack/plugin/otel-data/src/test/java/org/elasticsearch/xpack/oteldata/otlp/docbuilder/MetricDocumentBuilderTests.java (5 hunks)
x-pack/plugin/otel-data/src/test/java/org/elasticsearch/xpack/oteldata/otlp/tsid/AttributeListTsidFunnelTests.java (2 hunks)

🧰 Additional context used

🧬 Code graph analysis (6)

x-pack/plugin/otel-data/src/main/java/org/elasticsearch/xpack/oteldata/otlp/docbuilder/MetricDocumentBuilder.java (1)
- x-pack/plugin/otel-data/src/main/java/org/elasticsearch/xpack/oteldata/otlp/datapoint/TargetIndex.java (1): TargetIndex (21-158)

x-pack/plugin/otel-data/src/main/java/org/elasticsearch/xpack/oteldata/otlp/datapoint/TargetIndex.java (1)
- server/src/main/java/org/elasticsearch/cluster/metadata/DataStream.java (1): DataStream (87-2183)

x-pack/plugin/otel-data/src/test/java/org/elasticsearch/xpack/oteldata/otlp/datapoint/TargetIndexTests.java (1)
- x-pack/plugin/otel-data/src/main/java/org/elasticsearch/xpack/oteldata/otlp/datapoint/TargetIndex.java (1): TargetIndex (21-158)

modules/ingest-common/src/main/java/org/elasticsearch/ingest/common/RerouteProcessor.java (1)
- server/src/main/java/org/elasticsearch/cluster/metadata/DataStream.java (1): DataStream (87-2183)

x-pack/plugin/otel-data/src/test/java/org/elasticsearch/xpack/oteldata/otlp/docbuilder/MetricDocumentBuilderTests.java (1)
- x-pack/plugin/otel-data/src/main/java/org/elasticsearch/xpack/oteldata/otlp/datapoint/TargetIndex.java (1): TargetIndex (21-158)

x-pack/plugin/otel-data/src/main/java/org/elasticsearch/xpack/oteldata/otlp/datapoint/DataPointGroupingContext.java (1)
- x-pack/plugin/otel-data/src/main/java/org/elasticsearch/xpack/oteldata/otlp/datapoint/TargetIndex.java (1): TargetIndex (21-158)

🔇 Additional comments (17)

x-pack/plugin/otel-data/src/test/java/org/elasticsearch/xpack/oteldata/otlp/docbuilder/MetricDocumentBuilderTests.java (2)

26-26: Good switch to TargetIndex.defaultMetrics().

Using the shared default ensures tests track routing/index construction semantics consistently.

Also applies to: 77-78, 146-147, 178-179

106-108: Validating data_stream fields is spot on.

These assertions correctly reflect the derived defaults: type=metrics, dataset=generic.otel, namespace=default.

x-pack/plugin/otel-data/src/main/java/org/elasticsearch/xpack/oteldata/otlp/docbuilder/MetricDocumentBuilder.java (1)

53-55: Data stream block integration looks good.

Emitting data_stream only when targetIndex.isDataStream() is true matches the routing model and keeps documents clean for plain-index targets.

Also applies to: 115-124
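A minimal sketch of the conditional emission described above. It assumes a plain Map stands in for the document builder and that a hypothetical DataStreamTarget record exposes the accessors the walkthrough mentions (index(), isDataStream(), plus type/dataset/namespace). The PR's actual TargetIndex and MetricDocumentBuilder internals are not reproduced here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the PR's TargetIndex, reduced to what this sketch needs.
record DataStreamTarget(String index, String type, String dataset, String namespace) {
    // Assumption: an explicit-index target leaves the data stream fields unset.
    boolean isDataStream() {
        return type != null && dataset != null && namespace != null;
    }
}

final class DataStreamFieldSketch {
    private DataStreamFieldSketch() {}

    // Adds a data_stream object only when the target is a data stream;
    // plain-index targets get no data_stream.* fields at all.
    static void maybeAddDataStream(Map<String, Object> doc, DataStreamTarget target) {
        if (target.isDataStream() == false) {
            return;
        }
        Map<String, Object> dataStream = new LinkedHashMap<>();
        dataStream.put("type", target.type());
        dataStream.put("dataset", target.dataset());
        dataStream.put("namespace", target.namespace());
        doc.put("data_stream", dataStream);
    }
}
```

Keeping the check inside one helper means the document shape for explicit-index routing stays unchanged, which is the property the comment above highlights.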

x-pack/plugin/otel-data/src/main/java/org/elasticsearch/xpack/oteldata/otlp/docbuilder/MappingHints.java (1)

30-31: No external usages of MAPPING_HINTS found; making it private is safe.
Tests use the literal "elasticsearch.mapping.hints", so they’re unaffected.

modules/ingest-common/src/main/java/org/elasticsearch/ingest/common/RerouteProcessor.java (1)

12-12: Centralizing sanitization approved. DataStream.sanitizeType, sanitizeDataset, and sanitizeNamespace use the same disallowed-character patterns as the removed RerouteProcessor constants (type and dataset: `[\\/*?"<>| ,#:-]`; namespace: `[\\/*?"<>| ,#:]`), enforce MAX_LENGTH = 100, and replace disallowed characters with "_". No behavior drift detected.

server/src/main/java/org/elasticsearch/cluster/metadata/DataStream.java (1)

108-118: LGTM! Clean public API design.

The three public static sanitization methods (sanitizeType, sanitizeDataset, sanitizeNamespace) provide a clear, well-structured API for data stream field normalization. The delegation to a common private helper ensures consistency.
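For reference, a standalone sketch of the three public sanitizers delegating to one private helper. The constants and helper body mirror the DataStream diff quoted further down in this thread; the class name DataStreamSanitizerSketch is hypothetical, and the real methods live on DataStream.

```java
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical standalone copy of the DataStream sanitizers discussed in this review.
final class DataStreamSanitizerSketch {
    private static final int MAX_LENGTH = 100;
    private static final String REPLACEMENT = "_";
    // Java string "[\\\\/*?\"<>| ,#:-]" is the regex character class [\\/*?"<>| ,#:-].
    private static final Pattern DISALLOWED_IN_TYPE = Pattern.compile("[\\\\/*?\"<>| ,#:-]");
    private static final Pattern DISALLOWED_IN_DATASET = Pattern.compile("[\\\\/*?\"<>| ,#:-]");
    // Same class without the trailing '-', so hyphens survive in namespaces.
    private static final Pattern DISALLOWED_IN_NAMESPACE = Pattern.compile("[\\\\/*?\"<>| ,#:]");

    private DataStreamSanitizerSketch() {}

    static String sanitizeType(String type) {
        return sanitize(type, DISALLOWED_IN_TYPE);
    }

    static String sanitizeDataset(String dataset) {
        return sanitize(dataset, DISALLOWED_IN_DATASET);
    }

    static String sanitizeNamespace(String namespace) {
        return sanitize(namespace, DISALLOWED_IN_NAMESPACE);
    }

    // Shared helper: null passes through; otherwise lowercase, truncate, then replace.
    private static String sanitize(String s, Pattern disallowed) {
        if (s == null) {
            return null;
        }
        s = s.toLowerCase(Locale.ROOT);
        s = s.substring(0, Math.min(s.length(), MAX_LENGTH));
        return disallowed.matcher(s).replaceAll(REPLACEMENT);
    }
}
```

With these patterns, sanitizeDataset("Custom-Dataset") yields "custom_dataset", while sanitizeNamespace("custom-namespace") is returned unchanged, matching the TargetIndexTests expectations noted below.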

x-pack/plugin/otel-data/src/test/java/org/elasticsearch/xpack/oteldata/otlp/datapoint/DataPointGroupingContextTests.java (3)

62-87: LGTM! Test correctly validates target index-based grouping.

The test testGroupingDifferentTargetIndex effectively verifies that data points with different data_stream.dataset attributes are grouped separately and routed to distinct target indexes. The use of containsInAnyOrder is appropriate for unordered collection assertions.
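Switching the grouping map key from String to TargetIndex only works if the key type has value-based equals/hashCode. A minimal sketch, assuming a record (RoutingKey, hypothetical) stands in for TargetIndex and a hypothetical PointSketch stands in for a data point; whether the PR's TargetIndex is a record or overrides equals/hashCode itself is not shown in this review.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical key type; records provide value-based equals/hashCode automatically.
record RoutingKey(String index) {}

// Hypothetical data point carrying its already-evaluated routing key.
record PointSketch(String metricName, RoutingKey target) {}

final class GroupingByTargetSketch {
    private GroupingByTargetSketch() {}

    // Groups points so that equal keys (same index) share one bucket,
    // even when the key objects are distinct instances.
    static Map<RoutingKey, List<PointSketch>> groupByTarget(List<PointSketch> points) {
        Map<RoutingKey, List<PointSketch>> groups = new LinkedHashMap<>();
        for (PointSketch point : points) {
            groups.computeIfAbsent(point.target(), key -> new ArrayList<>()).add(point);
        }
        return groups;
    }
}
```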

244-266: LGTM! Test validates receiver-based routing.

The test testReceiverBasedRouting correctly verifies that the receiver name is extracted from the scope name and used to determine the target index. The expected index metrics-hostmetricsreceiver.otel-default aligns with the receiver name hostmetricsreceiver parsed from the scope path.

268-289: LGTM! Test covers edge case for receiver name extraction.

The test testReceiverBasedRoutingWithoutTrailingSlash validates the edge case where the scope name ends immediately after the receiver name (no trailing path segments). This ensures extractReceiverName correctly handles the absence of a trailing slash.

x-pack/plugin/otel-data/src/test/java/org/elasticsearch/xpack/oteldata/otlp/datapoint/TargetIndexTests.java (3)

23-37: LGTM! Test correctly validates explicit index precedence.

The test verifies that when elasticsearch.index is set, it takes precedence over data_stream.* attributes, and the result is a non-data-stream target index.

39-53: LGTM! Test validates data stream attribute handling and sanitization.

The test correctly verifies:
- Dataset sanitization: custom-dataset → custom_dataset (hyphen replaced), then the .otel suffix is added.
- Namespace sanitization: custom-namespace remains unchanged (hyphens are allowed in namespaces).

55-142: LGTM! Comprehensive test coverage for TargetIndex evaluation logic.

The test suite thoroughly validates:
- Scope and resource attribute handling
- Attribute precedence rules (attributes > scope > resource)
- Receiver-based routing
- Default value behavior
- Sanitization of uppercase letters, hyphens, and special characters

All assertions align with the expected behavior of TargetIndex.evaluate and DataStream sanitization methods.
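A rough sketch of the resolution rules these tests exercise: an explicit elasticsearch.index wins; otherwise data_stream.* attributes are looked up in attributes > scope > resource order, then the receiver name, then the defaults (generic, default), with the dataset sanitized and suffixed with .otel. Assumptions, all hypothetical: Map<String, String> replaces the OTLP attribute lists, an explicit data_stream.dataset beats the receiver name, the explicit index is returned unsanitized, and datasets that already end in .otel are not special-cased. Only the argument order follows the evaluate(...) call shown in the sequence diagram.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical sketch of TargetIndex-style resolution, returning just the index name.
final class TargetIndexResolutionSketch {
    private static final Pattern DISALLOWED_IN_DATASET = Pattern.compile("[\\\\/*?\"<>| ,#:-]");
    private static final Pattern DISALLOWED_IN_NAMESPACE = Pattern.compile("[\\\\/*?\"<>| ,#:]");

    private TargetIndexResolutionSketch() {}

    // First non-null value for key, checking data point, then scope, then resource attributes.
    private static String lookup(String key, List<Map<String, String>> levels) {
        for (Map<String, String> attrs : levels) {
            String value = attrs.get(key);
            if (value != null) {
                return value;
            }
        }
        return null;
    }

    // Lowercase, truncate to 100 chars, replace disallowed characters with '_'.
    private static String sanitize(String s, Pattern disallowed) {
        String lower = s.toLowerCase(Locale.ROOT);
        return disallowed.matcher(lower.substring(0, Math.min(lower.length(), 100))).replaceAll("_");
    }

    static String evaluate(
        String type,
        Map<String, String> attributes,
        String receiverName,
        Map<String, String> scopeAttributes,
        Map<String, String> resourceAttributes
    ) {
        List<Map<String, String>> levels = List.of(attributes, scopeAttributes, resourceAttributes);
        // An explicit index overrides every data stream attribute.
        String explicitIndex = lookup("elasticsearch.index", levels);
        if (explicitIndex != null) {
            return explicitIndex;
        }
        // Assumption: an explicit dataset attribute wins over the receiver name.
        String dataset = lookup("data_stream.dataset", levels);
        if (dataset == null) {
            dataset = receiverName != null ? receiverName : "generic";
        }
        String namespace = lookup("data_stream.namespace", levels);
        if (namespace == null) {
            namespace = "default";
        }
        return type + "-" + sanitize(dataset, DISALLOWED_IN_DATASET) + ".otel-" + sanitize(namespace, DISALLOWED_IN_NAMESPACE);
    }
}
```

With empty attribute maps and no receiver this resolves to metrics-generic.otel-default, and with receiver hostmetricsreceiver to metrics-hostmetricsreceiver.otel-default, the values the tests above expect.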

x-pack/plugin/otel-data/src/main/java/org/elasticsearch/xpack/oteldata/otlp/datapoint/DataPointGroupingContext.java (3)

164-194: LGTM! Receiver name extraction logic is sound.

The extractReceiverName method correctly parses the receiver name from the scope name by:

Finding the /receiver/ segment

Extracting the substring between /receiver/ and the next / (or end of string)

Returning null if no receiver pattern is found

Edge cases are handled correctly, including scope names ending with the receiver name (no trailing slash).
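A minimal sketch of the parsing described above. The class name is hypothetical, and treating an empty segment (a scope name ending in "/receiver/") as null is an assumption rather than something the review confirms.

```java
// Hypothetical standalone version of receiver-name extraction from an OTel scope name.
final class ReceiverNameSketch {
    private static final String RECEIVER_SEGMENT = "/receiver/";

    private ReceiverNameSketch() {}

    // Returns the path segment right after "/receiver/", or null if there is none.
    static String extractReceiverName(String scopeName) {
        if (scopeName == null) {
            return null;
        }
        int start = scopeName.indexOf(RECEIVER_SEGMENT);
        if (start < 0) {
            return null; // no receiver pattern in the scope name
        }
        start += RECEIVER_SEGMENT.length();
        int end = scopeName.indexOf('/', start);
        // No further '/' means the receiver name runs to the end of the string.
        String name = end < 0 ? scopeName.substring(start) : scopeName.substring(start, end);
        return name.isEmpty() ? null : name;
    }
}
```

For example, "github.com/open-telemetry/opentelemetry-collector-contrib/receiver/hostmetricsreceiver/internal/scraper/cpuscraper" yields "hostmetricsreceiver", the same receiver name testReceiverBasedRouting expects.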

220-226: LGTM! TargetIndex.evaluate call is correct.

The call to TargetIndex.evaluate correctly passes:

Type: TargetIndex.TYPE_METRICS ("metrics")

Attributes: data point, scope, and resource attributes in precedence order

Receiver name: extracted from the scope name

This ensures the target index is computed with all relevant context.

172-172: LGTM: String→TargetIndex consistently applied; tests and buildDataStream updated accordingly. Public API change: callers must call .index() or .toString() to get the string representation.

x-pack/plugin/otel-data/src/main/java/org/elasticsearch/xpack/oteldata/otlp/datapoint/TargetIndex.java (2)

21-40: API shape looks good; defaults and caching are practical

The public API and defaultMetrics() caching read well, and the data-stream vs explicit-index split is clear.

58-64: All evaluate() call sites use non-null list arguments
DataPoint.getAttributes()/getScopeAttributes()/getResourceAttributes() never return null, and tests use List.of(), so it is safe to add requireNonNull checks for the list parameters.

coderabbitai · 2025-09-30T21:12:22Z

server/src/main/java/org/elasticsearch/cluster/metadata/DataStream.java

```diff
+    private static final int MAX_LENGTH = 100;
+    private static final String REPLACEMENT = "_";
+    private static final Pattern DISALLOWED_IN_TYPE = Pattern.compile("[\\\\/*?\"<>| ,#:-]");
+    private static final Pattern DISALLOWED_IN_DATASET = Pattern.compile("[\\\\/*?\"<>| ,#:-]");
+    private static final Pattern DISALLOWED_IN_NAMESPACE = Pattern.compile("[\\\\/*?\"<>| ,#:]");
```


🧹 Nitpick | 🔵 Trivial

🧩 Analysis chain

Verify pattern correctness and rationale for identical TYPE and DATASET patterns.

The patterns DISALLOWED_IN_TYPE and DISALLOWED_IN_DATASET are identical, both matching the character class `[\\/*?"<>| ,#:-]`. DISALLOWED_IN_NAMESPACE differs only in that it allows hyphens (`[\\/*?"<>| ,#:]`).

Confirm:
- The regex patterns correctly escape special characters (especially the backslash and forward slash).
- Whether TYPE and DATASET should indeed share identical character restrictions, or whether a functional difference should be reflected in separate patterns.
- That the choice to allow hyphens in namespace but not in type/dataset is intentional and documented.

Run the following script to check if these patterns are used consistently across the codebase and verify any related documentation:

🏁 Script executed:

```bash
#!/bin/bash
# Search for usages of these sanitization methods and any related comments
rg -n -C3 'sanitizeType|sanitizeDataset|sanitizeNamespace|DISALLOWED_IN_TYPE|DISALLOWED_IN_DATASET|DISALLOWED_IN_NAMESPACE'
```

Length of output: 12218

coderabbitai · 2025-09-30T21:12:22Z

server/src/main/java/org/elasticsearch/cluster/metadata/DataStream.java

```diff
+    private static String sanitizeDataStreamField(String s, Pattern disallowedInDataset) {
+        if (s == null) {
+            return null;
+        }
+        s = s.toLowerCase(Locale.ROOT);
+        s = s.substring(0, Math.min(s.length(), MAX_LENGTH));
+        return disallowedInDataset.matcher(s).replaceAll(REPLACEMENT);
+    }
```


🛠️ Refactor suggestion | 🟠 Major

🧩 Analysis chain

Consider edge case: empty or all-invalid input strings.

The sanitization logic correctly handles null input and applies lowercase, length truncation, and character replacement. However, consider these edge cases:

- Empty string input: returns an empty string after sanitization, which may not be a valid dataset/namespace/type.
- All-disallowed-characters input: for example, "###" becomes "___" after sanitization, which might not be semantically meaningful.

Verify that callers handle empty or sanitized-to-invalid results appropriately, or add validation to reject such inputs.

🏁 Script executed:

```bash
#!/bin/bash
# Search for callers of these sanitization methods to verify they handle edge cases
rg -n -A5 'sanitizeType\(|sanitizeDataset\(|sanitizeNamespace\(' --type java
```

Length of output: 3450

Validate empty sanitized values
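sanitizeDataStreamField returns "" for empty input, but callers only fall back to defaults on null (e.g. TargetIndex.java 79–89), so empty results slip through. Treat an empty sanitized output as invalid (null), or add an isEmpty() check before defaulting. Fully-disallowed input is a separate case: "###" sanitizes to "___", not to an empty string, so an emptiness check alone does not catch it.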

🤖 Prompt for AI Agents

In server/src/main/java/org/elasticsearch/cluster/metadata/DataStream.java around lines 120–127, sanitizeDataStreamField returns an empty string for empty input, while callers only treat null as invalid. Update the method so that after lowercasing, truncating to MAX_LENGTH, and applying the disallowed pattern, it returns null when the result is empty (result.isEmpty()) instead of an empty string, so callers that check for null correctly fall back to their defaults.
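A minimal sketch of the suggested change. It uses a hypothetical EmptyAwareSanitizerSketch class carrying only the dataset pattern, plus a hypothetical datasetOrDefault caller standing in for the null-defaulting logic in TargetIndex. Only genuinely empty input maps to null; "###" still sanitizes to "___", so rejecting underscore-only values would need a separate check. Whether RerouteProcessor depends on the current empty-string result should be verified before changing the shared helper.

```java
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical variant of the shared helper that maps an empty result to null.
final class EmptyAwareSanitizerSketch {
    private static final int MAX_LENGTH = 100;
    private static final String REPLACEMENT = "_";
    private static final Pattern DISALLOWED_IN_DATASET = Pattern.compile("[\\\\/*?\"<>| ,#:-]");

    private EmptyAwareSanitizerSketch() {}

    static String sanitizeDataStreamField(String s, Pattern disallowed) {
        if (s == null) {
            return null;
        }
        s = s.toLowerCase(Locale.ROOT);
        s = s.substring(0, Math.min(s.length(), MAX_LENGTH));
        String result = disallowed.matcher(s).replaceAll(REPLACEMENT);
        // Replacement never shortens the string, so only empty input reaches this branch.
        return result.isEmpty() ? null : result;
    }

    static String sanitizeDataset(String dataset) {
        return sanitizeDataStreamField(dataset, DISALLOWED_IN_DATASET);
    }

    // Hypothetical caller: defaults only on null, as described for TargetIndex.
    static String datasetOrDefault(String rawDataset) {
        String sanitized = sanitizeDataset(rawDataset);
        return sanitized != null ? sanitized : "generic";
    }
}
```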

coderabbitai · 2025-09-30T21:12:22Z